Temporal Analysis and Visualization: Leveraging Time Series Capabilities in HVT (Hierarchical Voronoi Tessellation)

Zubin Dowlaty, Pon Anureka Seenivasan, Vishwavani

2024-10-17

1. Background

The HVT package is a collection of R functions for building topology preserving maps for rich multivariate data analysis, particularly for datasets tending towards big data, i.e., a large number of rows. The functions for this typical workflow are organized below:

  1. Data Compression: Vector Quantization (VQ), HVQ (Hierarchical Vector Quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.

  2. Data Projection: Dimension projection of the compressed cells to 1D, 2D, and an interactive surface plot using Sammon's non-linear mapping algorithm. This step creates topology preserving map coordinates (also called mathematical embeddings) in the desired output dimension.

  3. Tessellation: Create the cells required for object visualization using the Voronoi tessellation method; the package includes heatmap plots for Hierarchical Voronoi Tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map. Useful for semi-supervised tasks.

  4. Scoring: Scoring data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.

  5. Temporal Analysis and Visualization: A collection of functions that extends the HVT package by analyzing time series data for its underlying patterns, calculating transition probabilities, and visualizing the flow of data over time.

What’s New?

Below are the new functions and their brief descriptions:

2. Experimental setup

The Lorenz attractor is a three-dimensional figure generated by a set of differential equations that model a simple chaotic dynamical system of convective flow. It arises from a simplified set of equations describing a system of three variables, which represent the state of the system at any given time and are typically denoted (x, y, z). The equations are as follows:

\[ \frac{dx}{dt} = \sigma (y - x) \] \[ \frac{dy}{dt} = x (r - z) - y \] \[ \frac{dz}{dt} = x y - \beta z \]

where dx/dt, dy/dt, and dz/dt are the rates of change of x, y, and z respectively over time (t). σ, r, and β are constant parameters of the system: σ (= 10) controls the rate of convection, r (= 28) controls the difference in temperature between the convective and stable regions, and β (= 8/3) is the ratio of the width to the height of the convective layer. When these equations are plotted in three-dimensional space, they produce a chaotic trajectory that never repeats. The Lorenz attractor exhibits sensitive dependence on initial conditions: even small differences in the initial conditions can lead to drastically different trajectories over time. This sensitivity is a defining characteristic of chaotic systems.
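The equations above can be integrated numerically to generate a trajectory. Below is a minimal sketch using a simple Euler scheme; the step size dt = 0.00025 and initial state (0, 1, 20) are read off the dataset shown later, and the published dataset may well have been produced by a different solver.

```r
# Euler integration of the Lorenz system (illustrative sketch).
sigma <- 10; r <- 28; beta <- 8 / 3
dt <- 0.00025                      # time step, matching the spacing of the 't' column
n  <- 1000                         # short trajectory for illustration
state <- matrix(NA_real_, nrow = n, ncol = 3,
                dimnames = list(NULL, c("X", "Y", "Z")))
state[1, ] <- c(0, 1, 20)          # initial condition, matching the dataset's first row
for (i in 2:n) {
  x <- state[i - 1, "X"]; y <- state[i - 1, "Y"]; z <- state[i - 1, "Z"]
  state[i, ] <- c(x + dt * sigma * (y - x),       # dx/dt = sigma * (y - x)
                  y + dt * (x * (r - z) - y),     # dy/dt = x * (r - z) - y
                  z + dt * (x * y - beta * z))    # dz/dt = x * y - beta * z
}
head(round(state, 4), 3)
```

With these settings the first few rows closely track the X, Y, and Z columns of the dataset displayed in the next section.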

In this notebook, we will use the Lorenz Attractor Dataset. This dataset contains 200,000 (Two hundred thousand) observations and 5 columns. The dataset can be downloaded from here.

The dataset includes the following columns:

3. Notebook Requirements

This chunk verifies that all the packages necessary to run this vignette are installed, installs any that are missing, and attaches all the packages to the session environment.

list.of.packages <- c("dplyr", "kableExtra", "plotly", "purrr", "data.table", "gridExtra", "grid", "reactable", "reshape", "tidyr", 
                      "stringr", "DT", "knitr", "feather")

new.packages <-
  list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages))
  install.packages(new.packages, dependencies = TRUE, verbose = FALSE, repos='https://cloud.r-project.org/')
invisible(lapply(list.of.packages, library, character.only = TRUE))
# Sourcing required code scripts for HVT
script_dir <- "../R"
r_files <- list.files(script_dir, pattern = "\\.R$", full.names = TRUE)
invisible(lapply(r_files, function(file) source(file, echo = FALSE)))

4. Data Understanding

Here, we load the data. Let’s explore the Lorenz Attractor Dataset. For the sake of brevity, we are displaying only the first ten rows.

file_path <- "./sample_dataset/lorenz_attractor.feather"
dataset <- read_feather(file_path) %>% as.data.frame()
dataset <- dataset %>% select(X,Y,Z,U,t)
dataset$t <- round(dataset$t, 5)
displayTable(head(dataset, 10))
X Y Z U t
0.0000 1.0000 20.0000 0.0000 0.0000
0.0025 0.9998 19.9867 0.0005 0.0003
0.0050 0.9995 19.9734 0.0010 0.0005
0.0075 0.9993 19.9601 0.0015 0.0008
0.0099 0.9990 19.9468 0.0020 0.0010
0.0124 0.9988 19.9335 0.0025 0.0013
0.0149 0.9986 19.9202 0.0030 0.0015
0.0173 0.9984 19.9069 0.0035 0.0018
0.0198 0.9982 19.8937 0.0040 0.0020
0.0222 0.9980 19.8804 0.0045 0.0022

Now, let’s try to visualize the Lorenz attractor (overlapping spirals) in 3D Space.

data_3d <- dataset[sample(1:nrow(dataset), 1000), ]
plot_ly(data_3d, x= ~X, y= ~Y, z = ~Z) %>% add_markers( marker = list(
                          size = 2,
                          symbol = "circle",
                          color = ~Z,
                          colorscale = "Bluered",
                          colorbar = (list(title = 'Z'))))

Figure 1: Lorenz attractor in 3D space

Now, let’s have a look at the structure of the Lorenz Attractor dataset.

str(dataset)
## 'data.frame':    200000 obs. of  5 variables:
##  $ X: num  0 0.0025 0.00499 0.00747 0.00995 ...
##  $ Y: num  1 1 1 0.999 0.999 ...
##  $ Z: num  20 20 20 20 19.9 ...
##  $ U: num  0 0.0005 0.001 0.0015 0.002 ...
##  $ t: num  0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...

Data Distribution

This section displays five objects.

Variable Histograms: The histogram distribution of all the features in the dataset.

Box Plots: Box plots for all the features in the dataset. These plots will display the median and Interquartile range of each column at a panel level.

Correlation Matrix: This calculates the Pearson correlation which is a bivariate correlation value measuring the linear correlation between two numeric columns. The output plot is shown as a matrix.

Summary EDA: The table provides descriptive statistics for all the features in the dataset.

Time Series Plots: Plots of all features (including time) against the time column.

The inbuilt edaPlots function displays the five objects mentioned above.

NOTE: The input dataset should be a data frame, and all of its columns must be numeric.
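A small guard for this input contract can be written in base R. `check_numeric_df` below is a helper introduced here for illustration; it is not part of the HVT package.

```r
# Verify that the input is a data frame whose columns are all numeric,
# as edaPlots requires; stop with an informative message otherwise.
check_numeric_df <- function(df) {
  stopifnot(is.data.frame(df))
  non_numeric <- names(df)[!vapply(df, is.numeric, logical(1))]
  if (length(non_numeric) > 0)
    stop("Non-numeric columns: ", paste(non_numeric, collapse = ", "))
  invisible(TRUE)
}
# e.g. check_numeric_df(dataset) before calling edaPlots()
```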

edaPlots(dataset, time_column = "t", output_type = "timeseries", n_cols = 5)


edaPlots(dataset, output_type = 'summary', n_cols = 5)


edaPlots(dataset, output_type = 'histogram', n_cols = 5)


edaPlots(dataset, output_type = 'boxplot', n_cols = 5)


edaPlots(dataset, output_type = 'correlation', n_cols = 5)

Train - Test Split

Let us split the dataset into train and test. We will take the first 80% of the data, in order, as train and the remainder as test.

noOfPoints <- dim(dataset)[1]
trainLength <- as.integer(noOfPoints * 0.8)
trainDataset <- dataset[1:trainLength,]
testDataset <- dataset[(trainLength+1):noOfPoints,]
rownames(testDataset) <- NULL

4.1 Training dataset

Let’s have a look at the Training dataset containing 160,000 data points. For the sake of brevity, we are displaying the first 10 rows.

displayTable(head(trainDataset, 10))
X Y Z U t
0.0000 1.0000 20.0000 0.0000 0.0000
0.0025 0.9998 19.9867 0.0005 0.0003
0.0050 0.9995 19.9734 0.0010 0.0005
0.0075 0.9993 19.9601 0.0015 0.0008
0.0099 0.9990 19.9468 0.0020 0.0010
0.0124 0.9988 19.9335 0.0025 0.0013
0.0149 0.9986 19.9202 0.0030 0.0015
0.0173 0.9984 19.9069 0.0035 0.0018
0.0198 0.9982 19.8937 0.0040 0.0020
0.0222 0.9980 19.8804 0.0045 0.0022

Now, let’s have a look at the structure of the training dataset.

str(trainDataset)
## 'data.frame':    160000 obs. of  5 variables:
##  $ X: num  0 0.0025 0.00499 0.00747 0.00995 ...
##  $ Y: num  1 1 1 0.999 0.999 ...
##  $ Z: num  20 20 20 20 19.9 ...
##  $ U: num  0 0.0005 0.001 0.0015 0.002 ...
##  $ t: num  0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...

Data Distribution

edaPlots(trainDataset, time_column = "t", output_type = "timeseries", n_cols = 5)


edaPlots(trainDataset, output_type = 'summary', n_cols = 5)


edaPlots(trainDataset, output_type = 'histogram', n_cols = 5)


edaPlots(trainDataset, output_type = 'boxplot', n_cols = 5)


edaPlots(trainDataset, output_type = 'correlation', n_cols = 5)

4.2 Testing dataset

Let’s have a look at the Testing dataset containing 40,000 data points. For the sake of brevity, we are displaying the first 10 rows.

displayTable(head(testDataset,10))
X Y Z U t
16.0583 13.6588 39.5994 9.8935 40.0002
16.0523 13.6088 39.6278 9.8935 40.0004
16.0461 13.5587 39.6558 9.8934 40.0007
16.0399 13.5085 39.6837 9.8933 40.0010
16.0335 13.4582 39.7113 9.8932 40.0012
16.0270 13.4079 39.7386 9.8932 40.0014
16.0204 13.3575 39.7657 9.8931 40.0017
16.0137 13.3070 39.7926 9.8930 40.0020
16.0068 13.2564 39.8192 9.8929 40.0022
15.9999 13.2057 39.8456 9.8929 40.0025

Now, let’s have a look at the structure of the testing dataset.

str(testDataset)
## 'data.frame':    40000 obs. of  5 variables:
##  $ X: num  16.1 16.1 16 16 16 ...
##  $ Y: num  13.7 13.6 13.6 13.5 13.5 ...
##  $ Z: num  39.6 39.6 39.7 39.7 39.7 ...
##  $ U: num  9.89 9.89 9.89 9.89 9.89 ...
##  $ t: num  40 40 40 40 40 ...

Data Distribution

edaPlots(testDataset, time_column = "t", output_type = "timeseries", n_cols = 5)


edaPlots(testDataset, output_type = 'summary', n_cols = 5)


edaPlots(testDataset, output_type = 'histogram', n_cols = 5)


edaPlots(testDataset, output_type = 'boxplot', n_cols = 5)


edaPlots(testDataset, output_type = 'correlation', n_cols = 5)

5. Model Training and Visualization

We will use the trainHVT function to compress our dataset while preserving essential features.

Model Parameters

NOTE: The compression applies only to the X, Y, and Z coordinates, not to U (velocity) or t (timestamp). After training and scoring, we merge the U and t columns back into the dataset.

hvt.results <- trainHVT(
  trainDataset[,-c(4:5)],
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans",
  dim_reduction_method = "sammon")
## Initial stress        : 0.00301
## stress after  10 iters: 0.00155, magic = 0.500
## stress after  20 iters: 0.00154, magic = 0.500

Let’s check out the compression summary.

displayTable(data = hvt.results[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 100 0 0 n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

NOTE: Based on the provided table, it’s evident that the ‘percentOfCellsBelowQuantizationErrorThreshold’ value is zero, indicating that compression hasn’t taken place for the specified number of cells, which is 100. Typically, we would continue increasing the number of cells until at least 80% compression occurs. However, in this vignette demonstration, we’re not doing so, because the plots generated from temporal analysis functions would become cluttered and complex, making explanations less clear.
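The "keep increasing the number of cells" procedure described above can be sketched as a simple upward search. `find_n_cells` is a helper written for this illustration (the step size and cap are arbitrary choices), with the trainHVT call abstracted behind a function so the loop logic is testable on its own.

```r
# Search upward for an n_cells value reaching the usual 80% compression target.
# fit_fn(n_cells) should return the fraction of cells below the quantization
# error threshold (percentOfCellsBelowQuantizationErrorThreshold, as a fraction).
find_n_cells <- function(fit_fn, start = 100, step = 100, cap = 3000, target = 0.8) {
  n_cells <- start
  repeat {
    pct <- fit_fn(n_cells)
    if (pct >= target || n_cells >= cap) return(n_cells)
    n_cells <- n_cells + step
  }
}
# In this vignette the fit function would wrap trainHVT, e.g.:
# fit_fn <- function(k) {
#   res <- trainHVT(trainDataset[, -c(4:5)], n_cells = k, depth = 1,
#                   quant.err = 0.1, normalize = TRUE,
#                   distance_metric = "L1_Norm", error_metric = "max",
#                   quant_method = "kmeans")
#   res[[3]]$compression_summary$percentOfCellsBelowQuantizationErrorThreshold
# }
```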

Let’s check out the model summary from trainHVT().

hvt.results$model_info$input_parameters
## $input_dataset
## [1] "160000 Rows & 3 Columns"
## 
## $n_cells
## [1] 100
## 
## $depth
## [1] 1
## 
## $quant.err
## [1] 0.1
## 
## $normalize
## [1] TRUE
## 
## $distance_metric
## [1] "L1_Norm"
## 
## $error_metric
## [1] "max"
## 
## $quant_method
## [1] "kmeans"
## 
## $diagnose
## [1] FALSE
## 
## $projection.scale
## [1] 10
## 
## $hvt_validation
## [1] FALSE
## 
## $train_validation_split_ratio
## [1] 0.8

Now, let’s plot the Voronoi tessellation for 100 cells.

plotHVT(
  hvt.results,
  centroid.size = c(0.6),
  plot.type = '2Dhvt',
  cell_id = FALSE)
Figure 2: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset ’Lorenz attractor’


To understand how cell IDs are distributed across the map, we again plot Voronoi tessellation with cell_id = TRUE.

plotHVT(
  hvt.results,
  centroid.size = c(0.6),
  plot.type = '2Dhvt',
  cell_id = TRUE)
Figure 3: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset ’Lorenz attractor’ with Cell ID


6. Scoring

Now that we have built the model, let us score with it.

NOTE: For demonstration purposes, we are scoring the full raw dataset, i.e., the training dataset plus the testing dataset.

set.seed(240)
scoring <- scoreHVT(dataset,
                    hvt.results,
                    child.level = 1)

Let’s see the output of scoreHVT() which has scored the cells for all the data points in the dataset. For the sake of brevity, we will only show the first 100 rows.

In the displayTable function, the ‘columnName’ argument takes a column name; the values in that column greater than the given ‘value’ are highlighted.

displayTable(scoring$scoredPredictedData)
Segment.Level Segment.Parent Segment.Child n Cell.ID Quant.Error centroidRadius diff anomalyFlag X Y Z
1 1 89 1 55 0.0828 0.3469 0.2641 0 -0.1084 0.0120 -0.3780
1 1 89 1 55 0.0821 0.3469 0.2647 0 -0.1081 0.0120 -0.3795
1 1 89 1 55 0.0815 0.3469 0.2653 0 -0.1078 0.0120 -0.3810
1 1 89 1 55 0.0809 0.3469 0.2660 0 -0.1075 0.0120 -0.3826
1 1 89 1 55 0.0803 0.3469 0.2666 0 -0.1072 0.0119 -0.3841
1 1 89 1 55 0.0797 0.3469 0.2672 0 -0.1068 0.0119 -0.3856
1 1 89 1 55 0.079 0.3469 0.2678 0 -0.1065 0.0119 -0.3871
1 1 89 1 55 0.0784 0.3469 0.2684 0 -0.1062 0.0119 -0.3886
1 1 89 1 55 0.0778 0.3469 0.2691 0 -0.1059 0.0118 -0.3901
1 1 89 1 55 0.0772 0.3469 0.2697 0 -0.1056 0.0118 -0.3916
1 1 89 1 55 0.0766 0.3469 0.2703 0 -0.1053 0.0118 -0.3931
1 1 89 1 55 0.076 0.3469 0.2709 0 -0.1050 0.0118 -0.3947
1 1 89 1 55 0.0754 0.3469 0.2715 0 -0.1047 0.0117 -0.3962
1 1 89 1 55 0.0747 0.3469 0.2721 0 -0.1044 0.0117 -0.3977
1 1 89 1 55 0.0741 0.3469 0.2727 0 -0.1040 0.0117 -0.3992
1 1 89 1 55 0.0735 0.3469 0.2733 0 -0.1037 0.0117 -0.4007
1 1 89 1 55 0.0729 0.3469 0.2739 0 -0.1034 0.0117 -0.4022
1 1 89 1 55 0.0723 0.3469 0.2746 0 -0.1031 0.0116 -0.4037
1 1 89 1 55 0.0717 0.3469 0.2752 0 -0.1028 0.0116 -0.4052
1 1 89 1 55 0.0711 0.3469 0.2758 0 -0.1025 0.0116 -0.4067
1 1 89 1 55 0.0705 0.3469 0.2764 0 -0.1022 0.0116 -0.4082
1 1 89 1 55 0.0699 0.3469 0.2770 0 -0.1019 0.0116 -0.4097
1 1 89 1 55 0.0693 0.3469 0.2776 0 -0.1016 0.0116 -0.4112
1 1 89 1 55 0.0687 0.3469 0.2782 0 -0.1013 0.0115 -0.4127
1 1 89 1 55 0.0681 0.3469 0.2788 0 -0.1010 0.0115 -0.4142
1 1 89 1 55 0.0675 0.3469 0.2794 0 -0.1007 0.0115 -0.4156
1 1 89 1 55 0.0669 0.3469 0.2800 0 -0.1004 0.0115 -0.4171
1 1 89 1 55 0.0663 0.3469 0.2806 0 -0.1001 0.0115 -0.4186
1 1 89 1 55 0.0657 0.3469 0.2812 0 -0.0998 0.0115 -0.4201
1 1 89 1 55 0.0651 0.3469 0.2818 0 -0.0995 0.0115 -0.4216
1 1 89 1 55 0.0645 0.3469 0.2824 0 -0.0992 0.0115 -0.4231
1 1 89 1 55 0.0639 0.3469 0.2830 0 -0.0989 0.0114 -0.4246
1 1 89 1 55 0.0633 0.3469 0.2836 0 -0.0987 0.0114 -0.4261
1 1 89 1 55 0.0627 0.3469 0.2842 0 -0.0984 0.0114 -0.4275
1 1 89 1 55 0.0621 0.3469 0.2848 0 -0.0981 0.0114 -0.4290
1 1 89 1 55 0.0615 0.3469 0.2854 0 -0.0978 0.0114 -0.4305
1 1 89 1 55 0.0609 0.3469 0.2860 0 -0.0975 0.0114 -0.4320
1 1 89 1 55 0.0603 0.3469 0.2865 0 -0.0972 0.0114 -0.4335
1 1 89 1 55 0.0597 0.3469 0.2871 0 -0.0969 0.0114 -0.4350
1 1 89 1 55 0.0591 0.3469 0.2877 0 -0.0966 0.0114 -0.4364
1 1 89 1 55 0.0585 0.3469 0.2883 0 -0.0963 0.0114 -0.4379
1 1 89 1 55 0.058 0.3469 0.2889 0 -0.0961 0.0114 -0.4394
1 1 89 1 55 0.0574 0.3469 0.2895 0 -0.0958 0.0114 -0.4409
1 1 89 1 55 0.0568 0.3469 0.2901 0 -0.0955 0.0114 -0.4423
1 1 89 1 55 0.0562 0.3469 0.2907 0 -0.0952 0.0114 -0.4438
1 1 89 1 55 0.0556 0.3469 0.2913 0 -0.0949 0.0114 -0.4453
1 1 89 1 55 0.055 0.3469 0.2918 0 -0.0946 0.0114 -0.4467
1 1 89 1 55 0.0544 0.3469 0.2924 0 -0.0944 0.0114 -0.4482
1 1 89 1 55 0.0539 0.3469 0.2930 0 -0.0941 0.0114 -0.4497
1 1 89 1 55 0.0533 0.3469 0.2936 0 -0.0938 0.0114 -0.4512
1 1 89 1 55 0.0527 0.3469 0.2942 0 -0.0935 0.0114 -0.4526
1 1 89 1 55 0.0521 0.3469 0.2948 0 -0.0932 0.0114 -0.4541
1 1 89 1 55 0.0515 0.3469 0.2953 0 -0.0930 0.0114 -0.4556
1 1 89 1 55 0.0509 0.3469 0.2959 0 -0.0927 0.0114 -0.4570
1 1 89 1 55 0.0504 0.3469 0.2965 0 -0.0924 0.0114 -0.4585
1 1 89 1 55 0.0498 0.3469 0.2971 0 -0.0921 0.0114 -0.4599
1 1 89 1 55 0.0492 0.3469 0.2976 0 -0.0919 0.0114 -0.4614
1 1 89 1 55 0.0486 0.3469 0.2982 0 -0.0916 0.0114 -0.4629
1 1 89 1 55 0.0481 0.3469 0.2988 0 -0.0913 0.0114 -0.4643
1 1 89 1 55 0.0475 0.3469 0.2994 0 -0.0910 0.0114 -0.4658
1 1 89 1 55 0.0469 0.3469 0.2999 0 -0.0908 0.0114 -0.4672
1 1 89 1 55 0.0463 0.3469 0.3005 0 -0.0905 0.0114 -0.4687
1 1 89 1 55 0.0458 0.3469 0.3011 0 -0.0902 0.0114 -0.4701
1 1 89 1 55 0.0452 0.3469 0.3017 0 -0.0899 0.0114 -0.4716
1 1 89 1 55 0.0446 0.3469 0.3022 0 -0.0897 0.0114 -0.4730
1 1 89 1 55 0.044 0.3469 0.3028 0 -0.0894 0.0114 -0.4745
1 1 89 1 55 0.0435 0.3469 0.3034 0 -0.0891 0.0114 -0.4759
1 1 89 1 55 0.0429 0.3469 0.3039 0 -0.0889 0.0114 -0.4774
1 1 89 1 55 0.0423 0.3469 0.3045 0 -0.0886 0.0115 -0.4788
1 1 89 1 55 0.0418 0.3469 0.3051 0 -0.0883 0.0115 -0.4803
1 1 89 1 55 0.0412 0.3469 0.3056 0 -0.0881 0.0115 -0.4817
1 1 89 1 55 0.0406 0.3469 0.3062 0 -0.0878 0.0115 -0.4832
1 1 89 1 55 0.0401 0.3469 0.3068 0 -0.0875 0.0115 -0.4846
1 1 89 1 55 0.0395 0.3469 0.3073 0 -0.0873 0.0115 -0.4861
1 1 89 1 55 0.0389 0.3469 0.3079 0 -0.0870 0.0115 -0.4875
1 1 89 1 55 0.0384 0.3469 0.3085 0 -0.0867 0.0115 -0.4889
1 1 89 1 55 0.0378 0.3469 0.3090 0 -0.0865 0.0116 -0.4904
1 1 89 1 55 0.0373 0.3469 0.3096 0 -0.0862 0.0116 -0.4918
1 1 89 1 55 0.0367 0.3469 0.3102 0 -0.0860 0.0116 -0.4933
1 1 89 1 55 0.0361 0.3469 0.3107 0 -0.0857 0.0116 -0.4947
1 1 89 1 55 0.0356 0.3469 0.3113 0 -0.0854 0.0116 -0.4961
1 1 89 1 55 0.035 0.3469 0.3118 0 -0.0852 0.0116 -0.4976
1 1 89 1 55 0.0345 0.3469 0.3124 0 -0.0849 0.0117 -0.4990
1 1 89 1 55 0.0339 0.3469 0.3130 0 -0.0847 0.0117 -0.5004
1 1 89 1 55 0.0333 0.3469 0.3135 0 -0.0844 0.0117 -0.5019
1 1 89 1 55 0.0328 0.3469 0.3141 0 -0.0841 0.0117 -0.5033
1 1 89 1 55 0.0322 0.3469 0.3146 0 -0.0839 0.0117 -0.5047
1 1 89 1 55 0.0317 0.3469 0.3152 0 -0.0836 0.0118 -0.5062
1 1 89 1 55 0.0311 0.3469 0.3157 0 -0.0834 0.0118 -0.5076
1 1 89 1 55 0.0306 0.3469 0.3163 0 -0.0831 0.0118 -0.5090
1 1 89 1 55 0.03 0.3469 0.3168 0 -0.0829 0.0118 -0.5104
1 1 89 1 55 0.0295 0.3469 0.3174 0 -0.0826 0.0119 -0.5119
1 1 89 1 55 0.0289 0.3469 0.3179 0 -0.0824 0.0119 -0.5133
1 1 89 1 55 0.0284 0.3469 0.3185 0 -0.0821 0.0119 -0.5147
1 1 89 1 55 0.0278 0.3469 0.3190 0 -0.0819 0.0119 -0.5161
1 1 89 1 55 0.0273 0.3469 0.3196 0 -0.0816 0.0120 -0.5175
1 1 89 1 55 0.0267 0.3469 0.3201 0 -0.0814 0.0120 -0.5190
1 1 89 1 55 0.0262 0.3469 0.3207 0 -0.0811 0.0120 -0.5204
1 1 89 1 55 0.0256 0.3469 0.3212 0 -0.0809 0.0120 -0.5218
1 1 89 1 55 0.0251 0.3469 0.3218 0 -0.0806 0.0121 -0.5232

Let’s look at the scored model summary from scoreHVT().

The model info displays five attributes which are explained below:

  1. Dimension of the dataset used in scoring.
  2. The range of quantization error (minimum to maximum) after scoring all the data points of the given dataset.
  3. The value of the Mean Absolute Deviation (MAD) threshold.
  4. The number of anomalous data points. A data point is flagged as an anomaly if its quantization error exceeds the ‘mad.threshold’.
  5. The number of anomalous cells. A cell containing even one anomalous data point is considered an anomaly cell.
scoring$model_info$scored_model_summary
## $input_dataset
## [1] "200000 Rows & 5 Columns"
## 
## $scored_qe_range
## [1] "4e-04 to 0.3021"
## 
## $mad.threshold
## [1] 0.2
## 
## $no_of_anomaly_datapoints
## [1] 1353
## 
## $no_of_anomaly_cells
## [1] 17
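The two anomaly counts in the summary can be reproduced from the scored output with a couple of lines of base R. A minimal sketch on toy data, with column names following scoreHVT's scoredPredictedData:

```r
# Toy scored output with the two columns the anomaly counts depend on.
mad_threshold <- 0.2                           # as reported in the model summary
scored <- data.frame(Cell.ID     = c(1, 2, 2, 3),
                     Quant.Error = c(0.10, 0.30, 0.25, 0.05))
is_anom <- scored$Quant.Error > mad_threshold
n_anomaly_datapoints <- sum(is_anom)                          # points flagged as anomalies
n_anomaly_cells <- length(unique(scored$Cell.ID[is_anom]))    # cells with >= 1 flagged point
```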

Now, let’s merge the U and t columns from the dataset back into the scoring output and prepare the data for the temporal analysis functions.

temporal_data <- cbind(scoring$scoredPredictedData, dataset[,c(4,5)]) %>% select(Cell.ID,t)


7. Timeseries plot with State Transitions

Let’s look at the function plotStateTransition, which creates a time series plotly object.

plotStateTransition(
       df,
       sample_size,
       line_plot,
       cellid_column,
       time_column)
plotStateTransition(df = temporal_data, 
                    cellid_column = "Cell.ID", 
                    time_column = "t",
                    sample_size = 0.2)

To demonstrate the ‘sample_size’ argument, we replicate the plot above using the entire dataset.

plotStateTransition(df = temporal_data, 
                    cellid_column = "Cell.ID", 
                    time_column = "t",
                    sample_size = 1)


8. Transition probability tables

getTransitionProbability(
        df, 
        cellid_column, 
        time_column)
trans_table <- getTransitionProbability(df = temporal_data, 
                                        cellid_column = "Cell.ID", 
                                        time_column = "t")

NOTE: The output is stored as a nested list. For demonstration, we display it here as a data frame showing the first 10 rows.

combined_df <- do.call(rbind, trans_table)
displayTable(head(combined_df,10))
Current_State Next_State Relative_Frequency Transition_Probability
1 1 1161 0.9856
1 2 15 0.0127
1 7 2 0.0017
2 2 1382 0.9864
2 5 16 0.0114
2 10 3 0.0021
3 1 16 0.0097
3 3 1618 0.9842
3 7 10 0.0061
4 3 22 0.0131
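The relative frequencies and transition probabilities above can be computed from a time-ordered Cell.ID sequence using base R alone. A minimal sketch on a toy sequence, illustrating the idea rather than the HVT implementation itself:

```r
# Manual transition probabilities from a time-ordered state sequence.
cells <- c(1, 1, 2, 2, 1, 3)                  # toy time-ordered Cell.ID values
trans <- table(Current = head(cells, -1),     # state at time t
               Next    = tail(cells, -1))     # state at time t + 1
probs <- prop.table(trans, margin = 1)        # row-normalize: P(Next | Current)
probs
```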

9. Reconciling transition probability using markovchain package

Here we reconcile the transition probabilities from current states to next states, computed both manually and with the markovchain function, considering self-states and without self-states.

reconcileTransitionProbability(
                df, 
                hmap_type = "All", 
                cellid_column, 
                time_column)
reconcile_plots <- reconcileTransitionProbability(df = temporal_data, 
                                                  hmap_type = "All", 
                                                  cellid_column = "Cell.ID",
                                                  time_column = "t")

Reconciliation plots of transition probability with self-state

The probability of a state remaining in the same state is calculated both manually and with the markovchain function, and the two are plotted for comparison. The darker diagonal cells indicate higher probabilities of cells staying in the same state.

reconcile_plots[[1]]

Reconciliation table of transition probability with self-state

displayTable(reconcile_plots[[2]], limit = 217)
Current_State Next_State_manual Next_State_markov Probability_manual_calculation Probability_markov_function diff
1 2 2 0.0127 0.0127 0
1 7 7 0.0017 0.0017 0
2 5 5 0.0114 0.0114 0
2 10 10 0.0021 0.0021 0
3 1 1 0.0097 0.0097 0
3 7 7 0.0061 0.0061 0
4 3 3 0.0131 0.0131 0
4 9 9 0.0024 0.0024 0
5 10 10 0.0008 0.0008 0
5 11 11 0.0116 0.0116 0
6 4 4 0.0144 0.0144 0
6 14 14 0.0006 0.0006 0
7 1 1 0.0006 0.0006 0
7 2 2 0.0023 0.0023 0
7 10 10 0.0053 0.0053 0
7 15 15 0.0012 0.0012 0
8 6 6 0.0148 0.0148 0
8 13 13 0.0009 0.0009 0
9 3 3 0.0026 0.0026 0
9 7 7 0.0026 0.0026 0
9 15 15 0.0039 0.0039 0
10 11 11 0.0021 0.0021 0
10 19 19 0.0053 0.0053 0
10 20 20 0.0005 0.0005 0
11 19 19 0.0032 0.0032 0
11 21 21 0.0089 0.0089 0
12 8 8 0.0100 0.0100 0
12 13 13 0.0036 0.0036 0
13 6 6 0.0049 0.0049 0
13 8 8 0.0025 0.0025 0
13 14 14 0.0031 0.0031 0
13 18 18 0.0006 0.0006 0
14 4 4 0.0013 0.0013 0
14 9 9 0.0065 0.0065 0
15 10 10 0.0017 0.0017 0
15 20 20 0.0050 0.0050 0
16 12 12 0.0102 0.0102 0
16 17 17 0.0007 0.0007 0
17 12 12 0.0023 0.0023 0
17 13 13 0.0040 0.0040 0
17 18 18 0.0023 0.0023 0
18 13 13 0.0038 0.0038 0
18 14 14 0.0046 0.0046 0
19 21 21 0.0022 0.0022 0
19 26 26 0.0033 0.0033 0
19 27 27 0.0033 0.0033 0
20 19 19 0.0008 0.0008 0
20 26 26 0.0051 0.0051 0
21 27 27 0.0124 0.0124 0
22 16 16 0.0102 0.0102 0
22 23 23 0.0006 0.0006 0
23 17 17 0.0073 0.0073 0
23 22 22 0.0005 0.0005 0
24 18 18 0.0043 0.0043 0
24 23 23 0.0007 0.0007 0
25 22 22 0.0101 0.0101 0
25 28 28 0.0007 0.0007 0
26 27 27 0.0006 0.0006 0
26 32 32 0.0056 0.0056 0
26 33 33 0.0011 0.0011 0
27 26 26 0.0004 0.0004 0
27 33 33 0.0106 0.0106 0
28 22 22 0.0005 0.0005 0
28 23 23 0.0047 0.0047 0
28 25 25 0.0028 0.0028 0
29 23 23 0.0018 0.0018 0
29 24 24 0.0043 0.0043 0
30 25 25 0.0052 0.0052 0
30 28 28 0.0042 0.0042 0
31 29 29 0.0034 0.0034 0
32 37 37 0.0027 0.0027 0
32 40 40 0.0048 0.0048 0
33 32 32 0.0016 0.0016 0
33 40 40 0.0012 0.0012 0
33 44 44 0.0077 0.0077 0
34 28 28 0.0032 0.0032 0
34 30 30 0.0032 0.0032 0
35 30 30 0.0050 0.0050 0
35 34 34 0.0025 0.0025 0
36 29 29 0.0031 0.0031 0
36 31 31 0.0006 0.0006 0
36 34 34 0.0013 0.0013 0
37 39 39 0.0035 0.0035 0
37 48 48 0.0017 0.0017 0
38 31 31 0.0011 0.0011 0
38 36 36 0.0033 0.0033 0
39 31 31 0.0010 0.0010 0
39 38 38 0.0019 0.0019 0
39 46 46 0.0014 0.0014 0
40 37 37 0.0018 0.0018 0
40 48 48 0.0036 0.0036 0
40 56 56 0.0036 0.0036 0
41 34 34 0.0020 0.0020 0
41 35 35 0.0024 0.0024 0
42 35 35 0.0050 0.0050 0
43 34 34 0.0014 0.0014 0
43 36 36 0.0007 0.0007 0
43 41 41 0.0021 0.0021 0
44 40 40 0.0042 0.0042 0
44 56 56 0.0058 0.0058 0
45 39 39 0.0024 0.0024 0
45 53 53 0.0024 0.0024 0
46 38 38 0.0019 0.0019 0
46 47 47 0.0033 0.0033 0
47 43 43 0.0034 0.0034 0
47 52 52 0.0010 0.0010 0
48 50 50 0.0059 0.0059 0
48 58 58 0.0017 0.0017 0
49 42 42 0.0040 0.0040 0
50 46 46 0.0015 0.0015 0
50 54 54 0.0037 0.0037 0
50 62 62 0.0011 0.0011 0
51 41 41 0.0016 0.0016 0
51 49 49 0.0030 0.0030 0
52 43 43 0.0006 0.0006 0
52 51 51 0.0042 0.0042 0
52 63 63 0.0009 0.0009 0
53 39 39 0.0003 0.0003 0
53 46 46 0.0013 0.0013 0
53 54 54 0.0036 0.0036 0
54 47 47 0.0008 0.0008 0
54 55 55 0.0045 0.0045 0
54 60 60 0.0011 0.0011 0
55 47 47 0.0009 0.0009 0
55 52 52 0.0040 0.0040 0
55 64 64 0.0009 0.0009 0
56 48 48 0.0031 0.0031 0
56 58 58 0.0054 0.0054 0
57 45 45 0.0008 0.0008 0
57 53 53 0.0056 0.0056 0
58 50 50 0.0012 0.0012 0
58 62 62 0.0050 0.0050 0
59 45 45 0.0009 0.0009 0
59 57 57 0.0066 0.0066 0
59 65 65 0.0009 0.0009 0
60 55 55 0.0005 0.0005 0
60 64 64 0.0043 0.0043 0
61 60 60 0.0036 0.0036 0
61 66 66 0.0008 0.0008 0
62 54 54 0.0007 0.0007 0
62 60 60 0.0017 0.0017 0
62 66 66 0.0030 0.0030 0
63 67 67 0.0011 0.0011 0
63 68 68 0.0004 0.0004 0
63 72 72 0.0008 0.0008 0
64 52 52 0.0007 0.0007 0
64 63 63 0.0004 0.0004 0
64 68 68 0.0031 0.0031 0
65 57 57 0.0009 0.0009 0
65 61 61 0.0051 0.0051 0
66 71 71 0.0040 0.0040 0
67 72 72 0.0006 0.0006 0
67 76 76 0.0011 0.0011 0
68 63 63 0.0003 0.0003 0
68 72 72 0.0026 0.0026 0
68 73 73 0.0010 0.0010 0
69 59 59 0.0053 0.0053 0
69 70 70 0.0043 0.0043 0
70 59 59 0.0036 0.0036 0
70 65 65 0.0050 0.0050 0
71 74 74 0.0039 0.0039 0
72 73 73 0.0033 0.0033 0
72 76 76 0.0008 0.0008 0
72 77 77 0.0012 0.0012 0
73 77 77 0.0033 0.0033 0
73 80 80 0.0016 0.0016 0
74 79 79 0.0041 0.0041 0
75 69 69 0.0055 0.0055 0
75 70 70 0.0055 0.0055 0
76 77 77 0.0029 0.0029 0
76 82 82 0.0010 0.0010 0
77 80 80 0.0050 0.0050 0
77 82 82 0.0020 0.0020 0
78 69 69 0.0040 0.0040 0
78 75 75 0.0063 0.0063 0
79 83 83 0.0050 0.0050 0
80 84 84 0.0057 0.0057 0
81 75 75 0.0061 0.0061 0
81 78 78 0.0056 0.0056 0
82 84 84 0.0047 0.0047 0
83 86 86 0.0058 0.0058 0
83 88 88 0.0016 0.0016 0
84 83 83 0.0014 0.0014 0
84 88 88 0.0072 0.0072 0
85 78 78 0.0047 0.0047 0
85 81 81 0.0058 0.0058 0
86 89 89 0.0062 0.0062 0
86 90 90 0.0040 0.0040 0
87 81 81 0.0067 0.0067 0
87 85 85 0.0037 0.0037 0
88 86 86 0.0037 0.0037 0
88 90 90 0.0059 0.0059 0
89 93 93 0.0068 0.0068 0
89 96 96 0.0049 0.0049 0
90 89 89 0.0049 0.0049 0
90 96 96 0.0061 0.0061 0
91 85 85 0.0074 0.0074 0
91 87 87 0.0037 0.0037 0
92 87 87 0.0066 0.0066 0
92 91 91 0.0024 0.0024 0
92 97 97 0.0006 0.0006 0
93 95 95 0.0069 0.0069 0
93 98 98 0.0063 0.0063 0
94 92 92 0.0074 0.0074 0
94 99 99 0.0027 0.0027 0
95 94 94 0.0074 0.0074 0
95 100 100 0.0054 0.0054 0
96 93 93 0.0069 0.0069 0
96 98 98 0.0056 0.0056 0
97 91 91 0.0112 0.0112 0
97 92 92 0.0016 0.0016 0
98 95 95 0.0057 0.0057 0
98 100 100 0.0071 0.0071 0
99 92 92 0.0019 0.0019 0
99 97 97 0.0094 0.0094 0
100 94 94 0.0027 0.0027 0
100 99 99 0.0094 0.0094 0

Reconciliation plots of transition probability without self-state

The probability of a state moving to a different state is calculated both manually and with the markovchain function, and the two are plotted for comparison. For each current state, the next-state transition with the highest probability is selected.

reconcile_plots[[3]]
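The selection rule just described can be sketched in base R: mask the diagonal (self transitions) of the row-normalized transition matrix and, for each current state, take the most probable remaining next state. The probabilities below are toy values for illustration, not the package's actual computation.

```r
# Toy row-normalized transition probabilities (rows sum to 1).
probs <- matrix(c(0.80, 0.10, 0.10,
                  0.20, 0.70, 0.10,
                  0.00, 0.30, 0.70),
                nrow = 3, byrow = TRUE, dimnames = list(1:3, 1:3))
diag(probs) <- NA                         # mask self transitions
# For each current state, pick the most probable *different* next state;
# which.max skips NA entries, so the diagonal is ignored.
best_next <- apply(probs, 1, which.max)
```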

Reconciliation table of transition probability without self-state

displayTable(reconcile_plots[[4]], limit = 217)
Current_State Next_State_manual Next_State_markov Probability_manual_calculation Probability_markov_function diff
1 2 2 0.0127 0.0127 0
1 7 7 0.0017 0.0017 0
2 5 5 0.0114 0.0114 0
2 10 10 0.0021 0.0021 0
3 1 1 0.0097 0.0097 0
3 7 7 0.0061 0.0061 0
4 3 3 0.0131 0.0131 0
4 9 9 0.0024 0.0024 0
5 10 10 0.0008 0.0008 0
5 11 11 0.0116 0.0116 0
6 4 4 0.0144 0.0144 0
6 14 14 0.0006 0.0006 0
7 1 1 0.0006 0.0006 0
7 2 2 0.0023 0.0023 0
7 10 10 0.0053 0.0053 0
7 15 15 0.0012 0.0012 0
8 6 6 0.0148 0.0148 0
8 13 13 0.0009 0.0009 0
9 3 3 0.0026 0.0026 0
9 7 7 0.0026 0.0026 0
9 15 15 0.0039 0.0039 0
10 11 11 0.0021 0.0021 0
10 19 19 0.0053 0.0053 0
10 20 20 0.0005 0.0005 0
11 19 19 0.0032 0.0032 0
11 21 21 0.0089 0.0089 0
12 8 8 0.0100 0.0100 0
12 13 13 0.0036 0.0036 0
13 6 6 0.0049 0.0049 0
13 8 8 0.0025 0.0025 0
13 14 14 0.0031 0.0031 0
13 18 18 0.0006 0.0006 0
14 4 4 0.0013 0.0013 0
14 9 9 0.0065 0.0065 0
15 10 10 0.0017 0.0017 0
15 20 20 0.0050 0.0050 0
16 12 12 0.0102 0.0102 0
16 17 17 0.0007 0.0007 0
17 12 12 0.0023 0.0023 0
17 13 13 0.0040 0.0040 0
17 18 18 0.0023 0.0023 0
18 13 13 0.0038 0.0038 0
18 14 14 0.0046 0.0046 0
19 21 21 0.0022 0.0022 0
19 26 26 0.0033 0.0033 0
19 27 27 0.0033 0.0033 0
20 19 19 0.0008 0.0008 0
20 26 26 0.0051 0.0051 0
21 27 27 0.0124 0.0124 0
22 16 16 0.0102 0.0102 0
22 23 23 0.0006 0.0006 0
23 17 17 0.0073 0.0073 0
23 22 22 0.0005 0.0005 0
24 18 18 0.0043 0.0043 0
24 23 23 0.0007 0.0007 0
25 22 22 0.0101 0.0101 0
25 28 28 0.0007 0.0007 0
26 27 27 0.0006 0.0006 0
26 32 32 0.0056 0.0056 0
26 33 33 0.0011 0.0011 0
27 26 26 0.0004 0.0004 0
27 33 33 0.0106 0.0106 0
28 22 22 0.0005 0.0005 0
28 23 23 0.0047 0.0047 0
28 25 25 0.0028 0.0028 0
29 23 23 0.0018 0.0018 0
29 24 24 0.0043 0.0043 0
30 25 25 0.0052 0.0052 0
30 28 28 0.0042 0.0042 0
31 29 29 0.0034 0.0034 0
32 37 37 0.0027 0.0027 0
32 40 40 0.0048 0.0048 0
33 32 32 0.0016 0.0016 0
33 40 40 0.0012 0.0012 0
33 44 44 0.0077 0.0077 0
34 28 28 0.0032 0.0032 0
34 30 30 0.0032 0.0032 0
35 30 30 0.0050 0.0050 0
35 34 34 0.0025 0.0025 0
36 29 29 0.0031 0.0031 0
36 31 31 0.0006 0.0006 0
36 34 34 0.0013 0.0013 0
37 39 39 0.0035 0.0035 0
37 48 48 0.0017 0.0017 0
38 31 31 0.0011 0.0011 0
38 36 36 0.0033 0.0033 0
39 31 31 0.0010 0.0010 0
39 38 38 0.0019 0.0019 0
39 46 46 0.0014 0.0014 0
40 37 37 0.0018 0.0018 0
40 48 48 0.0036 0.0036 0
40 56 56 0.0036 0.0036 0
41 34 34 0.0020 0.0020 0
41 35 35 0.0024 0.0024 0
42 35 35 0.0050 0.0050 0
43 34 34 0.0014 0.0014 0
43 36 36 0.0007 0.0007 0
43 41 41 0.0021 0.0021 0
44 40 40 0.0042 0.0042 0
44 56 56 0.0058 0.0058 0
45 39 39 0.0024 0.0024 0
45 53 53 0.0024 0.0024 0
46 38 38 0.0019 0.0019 0
46 47 47 0.0033 0.0033 0
47 43 43 0.0034 0.0034 0
47 52 52 0.0010 0.0010 0
48 50 50 0.0059 0.0059 0
48 58 58 0.0017 0.0017 0
49 42 42 0.0040 0.0040 0
50 46 46 0.0015 0.0015 0
50 54 54 0.0037 0.0037 0
50 62 62 0.0011 0.0011 0
51 41 41 0.0016 0.0016 0
51 49 49 0.0030 0.0030 0
52 43 43 0.0006 0.0006 0
52 51 51 0.0042 0.0042 0
52 63 63 0.0009 0.0009 0
53 39 39 0.0003 0.0003 0
53 46 46 0.0013 0.0013 0
53 54 54 0.0036 0.0036 0
54 47 47 0.0008 0.0008 0
54 55 55 0.0045 0.0045 0
54 60 60 0.0011 0.0011 0
55 47 47 0.0009 0.0009 0
55 52 52 0.0040 0.0040 0
55 64 64 0.0009 0.0009 0
56 48 48 0.0031 0.0031 0
56 58 58 0.0054 0.0054 0
57 45 45 0.0008 0.0008 0
57 53 53 0.0056 0.0056 0
58 50 50 0.0012 0.0012 0
58 62 62 0.0050 0.0050 0
59 45 45 0.0009 0.0009 0
59 57 57 0.0066 0.0066 0
59 65 65 0.0009 0.0009 0
60 55 55 0.0005 0.0005 0
60 64 64 0.0043 0.0043 0
61 60 60 0.0036 0.0036 0
61 66 66 0.0008 0.0008 0
62 54 54 0.0007 0.0007 0
62 60 60 0.0017 0.0017 0
62 66 66 0.0030 0.0030 0
63 67 67 0.0011 0.0011 0
63 68 68 0.0004 0.0004 0
63 72 72 0.0008 0.0008 0
64 52 52 0.0007 0.0007 0
64 63 63 0.0004 0.0004 0
64 68 68 0.0031 0.0031 0
65 57 57 0.0009 0.0009 0
65 61 61 0.0051 0.0051 0
66 71 71 0.0040 0.0040 0
67 72 72 0.0006 0.0006 0
67 76 76 0.0011 0.0011 0
68 63 63 0.0003 0.0003 0
68 72 72 0.0026 0.0026 0
68 73 73 0.0010 0.0010 0
69 59 59 0.0053 0.0053 0
69 70 70 0.0043 0.0043 0
70 59 59 0.0036 0.0036 0
70 65 65 0.0050 0.0050 0
71 74 74 0.0039 0.0039 0
72 73 73 0.0033 0.0033 0
72 76 76 0.0008 0.0008 0
72 77 77 0.0012 0.0012 0
73 77 77 0.0033 0.0033 0
73 80 80 0.0016 0.0016 0
74 79 79 0.0041 0.0041 0
75 69 69 0.0055 0.0055 0
75 70 70 0.0055 0.0055 0
76 77 77 0.0029 0.0029 0
76 82 82 0.0010 0.0010 0
77 80 80 0.0050 0.0050 0
77 82 82 0.0020 0.0020 0
78 69 69 0.0040 0.0040 0
78 75 75 0.0063 0.0063 0
79 83 83 0.0050 0.0050 0
80 84 84 0.0057 0.0057 0
81 75 75 0.0061 0.0061 0
81 78 78 0.0056 0.0056 0
82 84 84 0.0047 0.0047 0
83 86 86 0.0058 0.0058 0
83 88 88 0.0016 0.0016 0
84 83 83 0.0014 0.0014 0
84 88 88 0.0072 0.0072 0
85 78 78 0.0047 0.0047 0
85 81 81 0.0058 0.0058 0
86 89 89 0.0062 0.0062 0
86 90 90 0.0040 0.0040 0
87 81 81 0.0067 0.0067 0
87 85 85 0.0037 0.0037 0
88 86 86 0.0037 0.0037 0
88 90 90 0.0059 0.0059 0
89 93 93 0.0068 0.0068 0
89 96 96 0.0049 0.0049 0
90 89 89 0.0049 0.0049 0
90 96 96 0.0061 0.0061 0
91 85 85 0.0074 0.0074 0
91 87 87 0.0037 0.0037 0
92 87 87 0.0066 0.0066 0
92 91 91 0.0024 0.0024 0
92 97 97 0.0006 0.0006 0
93 95 95 0.0069 0.0069 0
93 98 98 0.0063 0.0063 0
94 92 92 0.0074 0.0074 0
94 99 99 0.0027 0.0027 0
95 94 94 0.0074 0.0074 0
95 100 100 0.0054 0.0054 0
96 93 93 0.0069 0.0069 0
96 98 98 0.0056 0.0056 0
97 91 91 0.0112 0.0112 0
97 92 92 0.0016 0.0016 0
98 95 95 0.0057 0.0057 0
98 100 100 0.0071 0.0071 0
99 92 92 0.0019 0.0019 0
99 97 97 0.0094 0.0094 0
100 94 94 0.0027 0.0027 0
100 99 99 0.0094 0.0094 0
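The table above lists empirical transition probabilities between cells. As a minimal sketch (not the HVT implementation), first-order transition probabilities can be estimated from a sequence of scored cell IDs by counting consecutive pairs and normalizing each row; the toy `cell_sequence` below is illustrative only:

```r
# Toy sequence of cell IDs over time (illustrative, not Lorenz data)
cell_sequence <- c(1, 2, 2, 3, 1, 2, 3, 3, 1)

# Pair each state with its successor
from <- head(cell_sequence, -1)
to   <- tail(cell_sequence, -1)

# Count transitions and normalize each row to get probabilities
counts <- table(from, to)
probs  <- prop.table(counts, margin = 1)

round(probs, 3)
```

Each row of `probs` sums to 1, giving the estimated probability of moving from the row's cell to each column's cell in one time step.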


10. Animated Flowmaps

plotAnimatedFlowmap(
  hvt_model_output,
  transition_probability_df,
  df,
  animation = "All",
  flow_map = "All",
  fps_state,
  fps_time,
  time_duration,
  state_duration,
  cellid_column,
  time_column
)
flowmap_plots <- plotAnimatedFlowmap(hvt_model_output = hvt.results,
                                     transition_probability_df = trans_table,
                                     df = temporal_data,
                                     animation = NULL, flow_map = 'All',
                                     fps_time = 30, fps_state = 5,
                                     time_duration = 180, state_duration = 20,
                                     cellid_column = "Cell.ID", time_column = "t")
#> [1] "'animation' argument is NULL"


1. Flow map: Highest transition probability including self-state

The circle around a cell’s centroid represents its self-state probability: the larger the circle, the higher the probability of remaining in the same cell.

flowmap_plots[[1]]


2. Flow map: Highest transition probability excluding self-states

The arrow size represents the probability of the data moving to the next state, and the arrow direction indicates which cell it moves to next.

flowmap_plots[[2]]


3. Flow map animation: Highest state transition probabilities including self-state

The red point moves through the cells according to the timestamps; blinking indicates that it stays in the same cell for a certain amount of time, which can be read from the sub-header of the GIF.

#flowmap_plots[[3]] 


4. Flow map animation: Highest state transition probabilities excluding self-states

The arrow moves from the current cell to the next, and its length reflects the transition probability of the next state, excluding self-states.

#flowmap_plots[[4]]
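The flow maps above highlight, for each cell, its single most likely next state. A minimal sketch of that selection on a small transition table (the column names `current`, `nxt`, and `prob` are illustrative, not the HVT schema):

```r
# Hypothetical long-format transition table (illustrative values)
trans_df <- data.frame(
  current = c(1, 1, 2, 2, 3, 3),
  nxt     = c(1, 2, 2, 3, 1, 3),
  prob    = c(0.7, 0.3, 0.4, 0.6, 0.9, 0.1)
)

# Highest-probability transition per cell, including self-states
incl <- do.call(rbind, lapply(split(trans_df, trans_df$current),
                              function(d) d[which.max(d$prob), ]))

# Same selection, excluding self-states (current == next cell)
no_self <- trans_df[trans_df$current != trans_df$nxt, ]
excl <- do.call(rbind, lapply(split(no_self, no_self$current),
                              function(d) d[which.max(d$prob), ]))
```

Here `incl` drives the self-state circles in flow map 1, while `excl` corresponds to the arrows of flow map 2, where a cell's most likely destination must be a different cell.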